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Abstract 

The simulation of the continuation of a given time series is useful for many practical appli- 
cations. But no standard procedure for this task is suggested in the literature. It is therefore 
demonstrated how to use the seasonal ARIMA process to simulate the continuation of an ob- 
served time series. The R-code presented uses well-known modeling procedures for ARIMA 
models and conditional simulation of a S ARIMA model with known parameters. A small ex- 
ample demonstrates the correctness and practical relevance of the new idea. 

seasonal ARIMA, simulation, R 

1 Introduction 

In many application areas it is of practical interest to be able to simulate the possible continuation 
of a given time series. For example in finance the simulation of future stock prices is a well-known 
standard method used for risk assessment and for option pricing. In inventory management the 
simulation of future demands could be used to compare the performance of different inventory 
policies. Clearly many other examples of applications in different areas are possible. When we 
wanted to quantify the risk of a company due to the uncertainty of the demand in the next months 
we tried to find code that simulates random future observations of the demand subject to the 
SARIMA (Seasonal Autoregressive Integrated Moving Average) model we had fitted to the data. 
We were astonished when we realized that we were not able to find a single paper in the literature 
that tackles this problem. It is clear that such a simulation conditional on given data requires 
a modeling and a parameter estimation step. These two steps are also required for forecasting. 
Seasonal (and non-seasonal) ARIMA models have been considered as standard procedures for many 
years (jlj) and are described in many text books (see eg. |S]). After selecting an ARIMA model 
and the estimation of its parameters only the conditional simulation of future realizations given the 
observations is required. Many software packages (including R ([6J)) contain functions to simulate 
realizations of ARIMA processes. But we were not able to find any description or implementation 
of a "simulation conditional on the observed values" for ARIMA models in the literature. 
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We therefore present our simple idea of conditional simulating an SARIMA process in Section[2] 
Section [3] contains our R-implementation whereas Section |4] demonstrates the application of our 
code for a practical example. 

2 SARIMA processes 

In R, the notation used for ARMA(p, q) processes is 

Xt = (piXt-i + ... + 4>pXt-p + et + OiEt-i + ... + Oqet-q + ^l (1) 

where (pi denotes the parameters of the autoregressive process, 6j the parameters of the moving 
average process and e the white noise error terms with standard deviation a following the normal 
distribution. Using the well known backshift operator notation we can rewrite the above definition 
by 

(p{B)Xt = e{B)et + iJi, 

where (l){B) denotes a backshift polynomial of order p and 9{B) a backshift polynomial of order q. 
An ARIMA(j3, d, q) process Xt is a process whose d-ih. difference 

V^Xt = [l-BfXt 

is an ARMA(|?, q) process. An ARIMA(p, d, q) process is thus defined by the equation: 

4>{B) V^Xt = e{B) et + fi. 

For seasonal ARIMA (SARIMA) processes with period s, a seasonal AR polynomial <I>(i3**) of order 
p, a seasonal MA polynomial QB'^ of order q and the seasonal difference operator of order d 

ViXt = {l-B'fXt 
are required. The SARIMA(p, d, q){p, d, q)s is then defined by the equation: 

^B') 4>{B) vfXt = @{B') e{B) et + ii. (2) 

For given observations the selection of the model orders p, d, q,p, d, q for a SARIMA model is 
the topic of many books on time series analysis and without the scope of this note. The estimation 
of the parameters is easy using the arima() function of the R-base stats package. We now assume 
that the observed time series xi,X2, ■ ■ ■ ,xt is a realization of the stochastic process defined by all 
model assumptions of the SARIMA model and its estimated parameters. We now consider the 
next m observations of the time series X = (Xt+i, Xt+2, ■ ■ ■ ,Xt+m) that are not known yet. The 
distribution of that random vector conditional on the observed values xi,X2, ■ ■ ■ ,xt is multi- normal 
and can be called conditional distribution of the future observation. The forecasts of the SARIMA 
model for the given time series are the expectations of the one- dimensional marginals of that 
conditional distribution and we write for example 

Xt+2 = E{Xt + 2\xi,X2, ... ,xt) . 

The conditional standard deviations of the one-dimensional marginals are used to calculate the pre- 
diction error. Exactly these conditional expectations and variances are calculated by the predict() 
function. 
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3 Conditional simulation of a SARIMA process 



The new idea of this note is now the suggestion to provide code that generates random reahzations 
of the future observation vector conditional on the observed observations. We can write 

X\xi,X2, . ..,Xt = {Xt+i,Xt+2, ■ ■ ■ ,Xt+m)\xi,X2, ■,Xt 

for that future observation vector and we hope that the presentation above made clear, why we 
can call a realization of that random vector a random continuation of the time series data we 
have observed. As the distribution of the vector is multi-normal it would be possible to generate 
from that distribution calculating its mean vector and variance-covariance matrix. But it is much 
easier to use directly the recursion of the model equation of the ARMA model: To generate a 
random realization of Xt+i conditional on the past we need the past observations, the estimated 
parameters and the residuals (ie. the estimates of the random shocks e^). It is then no problem to 
simulate Xt+i using the recursion given in formula ([T]). The new random shock St+i is generated 
as a normal random variate with mean zero and standard deviation a. The simulation of Xt-\-2 is 
done conditional on the past observations and on the generated values £t+i and Xt+i. 

For the case of SARIMA processes the model equation is again a linear combination of past 
observations and past random shocks together with the new random shock et- Due to the seasonal 
model some of the AR-parameters ((/>) and MA-parameters {9) are equal to zero. So again we can 
use the recursive approach explained above. 

In this code snippet we present the R routine we coded according to the above explanations. 
It generates future observations from seasonal and non-seasonal ARIMA processes conditional on 
an observed time series. 

Algorithm [T] summarizes how m future values are simulated from seasonal and non-seasonal 
ARIMA process. When we fit the ARIMA model using the arima() function of the R-base stats 
package, all the model parameters including the estimated variance of error terms (cr^) are returned 
as demonstrated in Section |4j R codes for Algorithm [T] are given in Section [5} 

Algorithm 1 simulating m future observations from seasonal and non-seasonal ARIMA process. 
1: If necessary do the differencing (seasonal and/or non-seasonal) of the data given in the fitted 

model (otherwise Vx simply refers to the original series) 
2: compute intercept /i = Va;(l — Yl^=i *^«) where Vx denotes for the average of Vx 
3: construct a vector x of size p + m 
4: construct a vector e of size q + m 
5: equate first p terms of x to Vx at newest p time steps 
6: equate (ejij , e[g]) to the residuals of data at newest q time steps 
7: for future time steps k = 1, ..,m do 
8: generate e[q+fc] from A^(0, a) 

9: apply moving average and auto-regressive filtering on xjp+fcj as in (Equation [Tj) 
10: end for 

11: remove first p elements of x to get only differences of future time steps 
12: undifference x 
13: return x 
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4 Numerical experiments 



In this section we first fit a seasonal model to monthly totals of international airline passengers 
between 1949 and 1960 using the arima(). (The data are available in the R package fma [2].) 

R> library ("fma") 

R> set.seed(4321) 

R> data <- airpass 

R> Par <- c(l, 1, 1, 0, 1, 0) 

R> fit <- arimaCdata, order = c(Par[l] , Par[2], Par[3]), 

+ seasonal = list (order = c(Par[4] , Par[5] , Par [6]))) 

R> fit 

Series: data 

ARIMAd, 1, 1) (0, 1, 0) [12] 
Coefficients : 

arl mal 
-0.3009 -0.0073 
s.e. 0.3835 0.4133 

sigma"2 estimated as 137: log likelihood = -508.2 
AIC = 1022.39 AICc = 1022.58 BIG = 1031.02 

For demonstration purposes we generate five different independent continuations of the time 
series and show them, their average and the forecasted values in Figure [T] 

R> sims <- arima. condsim(f it , data, n. ahead = 12, n = 5) 

R> tsl <- ts(sims[, 1] , f = frequency (data) , s = tsp(data) [2] + 

+ l/tsp(data) [3] ) 

R> ts2 <- ts(sims[, 2] , f = frequency (data) , s = tsp(data) [2] + 
+ l/tsp(data) [3] ) 

R> ts3 <- ts(sims[, 3] , f = frequency (data) , s = tsp(data) [2] + 
+ l/tsp(data) [3] ) 

R> ts4 <- ts(sims[, 4] , f = frequency (data) , s = tsp(data) [2] + 
+ l/tsp(data) [3] ) 

R> ts5 <- ts(sims[, 5] , f = frequency (data) , s = tsp(data) [2] + 
+ l/tsp(data) [3] ) 

R> tsA <- ts (sapply (seq_len(12) , function(i) mean(sims [i ,] ) ) , 

+ f = frequency (data) , s = tsp(data) [2] +l/tsp(data) [3] ) 

R> ts.plot(tsl, ts2, ts3, ts4, ts5, gpars = list(xlab = "1961", 

+ ylab = "Monthly international airline passengers", xaxt = "n")) 

R> lines(tsA, col="blue") 

R> lines (predict (fit, n.ahead=12)$pred, col="red") 

R> axisd, time(tsl), rep(substr (month. abb, 1, 1), length = length(tsl) ) ) 

In the following experiment we simulate 10,000 independent continuations and show that, as 
expected, the mean of the simulated values is very close to the forecasted values. It is also possible 
to use innovations equal to zero in our function arima. condsim() to produce the exact forecasts. 

R> sims <- arima. condsim (fit , data, n. ahead = 12, n = 10000) 
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Figure 1: Five different simulations and tlieir average (blue line) of the monthly international 
airline passengers in 1961 and forecasted values (red line). 



R> sims_mean <- sapply(seq_len(12) , function(i) mean(sims [i , ])) 
R> ts <- ts(sims_mean, f = frequency (data) , s = tsp(data) [2] + 
+ l/tsp(data) [3] ) 

R> ts 

Jan Feb Mar Apr May Jun 

1961 444.2828 418.1049 446.0237 487.9601 498.8899 562.0800 

Jul Aug Sep Oct Nov Dec 

1961 648.9706 633.0297 535.0563 487.9923 417.1746 459.2555 
R> predict (fit, n. ahead = 12) 

$pred 

Jan Feb Mar Apr May Jun 

1961 444.3670 418.2566 446.2898 488.2798 499.2828 562.2819 

Jul Aug Sep Oct Nov Dec 

1961 649.2822 633.2821 535.2821 488.2821 417.2821 459.2821 

Finally, we fit a non-seasonal model to the same data to show that our function works both for 
seasonal and non-seasonal models. 

R> Par <- c(l, 0, 1, 0, 0, 0) 

R> fit <- arima(data, order = c(Par[l] , Par[2], Par[3]), 

+ seasonal = list (order = c(Par[4] , Par[5] , Par [6]))) 

R> fit 

Series: data 

ARIMAd, 0, 1) with non-zero mean 
Coefficients : 

arl mal intercept 
0.9373 0.4264 281.5426 
s.e. 0.0302 0.0911 53.6135 



sigma"2 estimated as 968.5: log likelihood = -700.87 

AIC = 1409.75 AICc = 1410.04 BIG = 1421.63 

R> sims <- arima. condsim(f it , data, n. ahead = 12, n = 10000) 

R> sims_meaii <- sapply (seq_len(12) , function(i) meaii(sims [i , ])) 

R> ts <- ts(sims_mean, f = frequency (data) , s = tsp(data) [2] + 

+ l/tsp(data) [3]) 

R> ts 

Jan Feb Mar Apr May Jun 

1961 453.9091 443.5161 432.8683 422.7560 414.1958 406.3113 

Jul Aug Sep Oct Nov Dec 

1961 398.7037 391.8506 384.9362 378.4532 372.7470 367.1855 
R> predict (fit, n. ahead = 12) 

$pred 

Jan Feb Mar Apr May Jim 

1961 453.9038 443.0989 432.9713 423.4785 414.5809 406.2410 

Jul Aug Sep Oct Nov Dec 

1961 398.4239 391.0969 384.2292 377.7920 371.7583 366.1029 



5 Source code 

arima. condsim <- function (object , x, n. ahead = 1, n = 1){ 
L <- length (x) ; coef <- object$coef; 
arma <- object$arma; model <- object$model; 
p <- length (model$phi) ; q <- length (model$theta) 
d <- arma [6] ; s. period <- arma [5] ; 
s.diff <- arma [7] 

if(s.diff > && d > 0){ 
diff.xi <- 0; 

dx <- diff(data, lag = s. period, diff erences=s.diff ) 
diff.xi[l] <- dx[length(dx) - d + 1] ; 

dx <- diff(dx, differences = d) 

diff.xi <- c(diff .xi[l] , data[(L - s.diff * s. period + 1):L]) 
>else if (s.diff > 0){ 

dx <- diff (data, lag = s. period, differences = s.diff) 

diff.xi <- data[(L - s.diff * s. period + 1):L] 
>else if (d > 0){ 

dx <- diff (data, differences = d) ; 

diff.xi <- data[(L - d + 1):L] 
}else{dx <- data} 

use. constant <- is . element ("intercept" , names (coef)) 
mu <- 
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if (use . constant) { 

mu <- coef [sum(arina[l:4]) + 1] [[1]] * (1 - sum(model$phi) ) 

} 

p. startlndex <- length(dx) - p 
start . innov <- NULL 
if (q > 0){ 

start. innov <- residuals (object) [(L - q + 1) : (L)] 

} 

res <- array (0, c(n. ahead, n)) 
for(r in l:n){ 

innov = rnorm(n. ahead, sd = sqrt(object$sigma2)) 
if (q > 0){ 

e <- c (start . innov, innov) 
}else{e <- innov} 

xc <- array (0, dim = p + n. ahead) 

if (p != 0) for(i in l:p) xc [i] <- dx[ [p. start Index + i]] 
k <- 1 

for(i in (p + 1) : (p + n.ahead)){ 
xc[i] <- e[q + k] 
if(q != 0) 

xc[i] <- xc[i] + sum(model$theta * e[(q + k - l):k]) 
if(p != 0) 

xc[i] <- xc[i] + sum(model$phi * xc[(i - 1) : (i - p)]) 
if (use . constant) 

xc [i] <- xc [i] + mu 
k <- k + 1 

} 

xc <- as. vector (unlist(xc[(p + 1) : (p + n. ahead)])) 

if((d > 0) && (s.diff > 0)){ 

xc <- diffinv(xc, differences = d, xi = dif f .xi [1] ) [-c(l : d)] 

xc <- diffinv(xc, lag = s. period, differences = s.diff, 
xi = diff.xi [2: (s.diff * s. period + 1)]) 

xc <- xc [-(1 : (s .dif f * s. period))] 
>else if (s.diff > 0) { 

xc <- diffinv(xc, lag = s. period, differences = s.diff, 
xi = dif f .xi [1 : (s .dif f * s. period)]) 

xc <- xc [-(1 : (s .dif f * s. period))] 
>else if (d > 0){ 

xc <- diffinv(xc, differences = d, xi = diff .xi) [-c(l :d)] 

} 

res[, r] <- xc 
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} 

res 

} 



6 Discussion 

We have demonstrated that, using a SARIMA model, it is not difficult to simulate from the 
conditional distribution of future observations. Our code can thus be used to randomly generate 
possible future continuations of a time series. The identification of a suitable SARIMA model is an 
important step in the procedure we suggest; due to the nature of this short note we have to refer 
the reader to the vast literature on time scries analysis for this task; an important point in the 
modeling procedure are also checks for the model assumption. Especially the normal assumption 
for the error term has an important impact on the simulated future observations. 

Despite these important limitations, that arc present in all parametric statistical models, we 
hope that our simple algorithm will be useful for many applications. This seems likely as we were 
not able to find any suggestions for a similar algorithm in the literature. 
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